A Combined Mathematical Treatment for a Special Automatic Music Transcription System
Authors
Abstract
(i) Preprocessing: this is the first step of the training process. It first removes the silent parts at the front and back of the note section and then applies a windowed DFT transformation.

(ii) Normalization: signal normalization means converting the preprocessed signal into an equivalent one with mean 0 and variance 1, which facilitates the follow-up treatment. After normalization the signal is separated into two parts: one goes into the PCA step, and the other is used to analyze the harmonic structure.

(iii) PCA: PCA means principal component analysis. It pre-extracts principal components common to all the notes together; that is, it maps all the notes from a higher-dimensional space into a relatively low-dimensional coordinate system. When PCA is applied to the training data here, we can define a PCA score I_pca, meaning that we retain only the components whose cumulative variance ratio is greater than this score (a sketch of steps (i)-(iii) is given after this list).

(iv) Calculate H_n: in this step we decide the number of harmonics of each note and the spectral amplitude of each harmonic. In fact, each note has clear harmonic characteristics: there are clear impulses near the integer multiples of the fundamental frequency. The same note played by different instruments has harmonics at almost the same locations, although the amplitudes may differ. Using this feature, we can roughly estimate whether a certain piece of audio contains a given note, and it can be used to select candidates for the fundamental frequency to facilitate the follow-up treatment.

(v) Enframe: the length of the music to be detected is variable; thus, enframe processing is needed in order to obtain a more accurate detection and estimation of the notes. Each frame yields a detection result showing the notes that the frame contains.

(vi) Frequency primary selection: in order to reduce the computation and improve the estimation accuracy, a frequency primary selection step is performed before the multiple fundamental frequency estimation step. In this step, all candidate notes are estimated; afterwards, the final notes need only be selected from these candidates instead of from all possible notes, which greatly benefits both speed and estimation accuracy. The details are described in the next section.

(vii) Multiple-F0 estimation: multiple fundamental frequency estimation here means estimating the constituent notes from the candidates, based on the data produced by the above processing. This is the core part of the whole system, and we can use mathematical methods to solve this problem. The details are described in the next section.
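To make steps (i)-(iii) concrete, here is a minimal NumPy sketch of the training front end. The paper gives no implementation, so the frame length, the silence threshold, and all function names below are hypothetical; the PCA score I_pca is read, as in (iii), as a cumulative variance-ratio cutoff.

```python
import numpy as np

def preprocess(signal, frame_len=2048, silence_db=-40.0):
    """Step (i): trim leading/trailing silence, then apply a
    Hann-windowed DFT (one frame shown for brevity).
    `signal` is assumed to be a mono float array."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # Keep frames within `silence_db` of the loudest one (hypothetical rule).
    keep = np.flatnonzero(power_db > power_db.max() + silence_db)
    trimmed = frames[keep[0]: keep[-1] + 1].ravel()
    window = np.hanning(frame_len)
    return np.abs(np.fft.rfft(trimmed[:frame_len] * window))

def normalize(x):
    """Step (ii): convert to an equivalent signal with mean 0, variance 1."""
    return (x - x.mean()) / (x.std() + 1e-12)

def pca_project(X, i_pca=0.95):
    """Step (iii): project onto the leading principal components whose
    cumulative variance ratio first exceeds the PCA score I_pca."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, i_pca)) + 1   # smallest k reaching I_pca
    return Xc @ eigvecs[:, :k]
```

Here X would be a matrix whose rows are the normalized magnitude spectra of the training materials, one row per note recording.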
3. Using Harmonic Structure

3.1. Essential Knowledge

Timbre refers to the perceptual features of a sound, such as frequency, loudness, and duration. Timbre has an identifying function: we can distinguish two sounds with the same pitch and intensity but coming from different musical instruments according to differences in timbre. Timbre is a multidimensional object; in addition to frequency, loudness, and duration, it also involves the amplitude envelope and the spectral envelope. The spectral envelope is an important parameter for describing a sound in the frequency domain, and it is constituted by the amplitudes of all harmonics. We can describe the spectral envelope by the harmonic structure, labeled H_n:

H_n = \{a_1, a_2, \ldots, a_n\},  (3.1)

where a_k is the amplitude of the kth harmonic and n is the number of harmonics.

In accordance with western musical convention, note events are ordered using a logarithmic scale [7]. For linear-frequency-to-MIDI conversion, the following expression can be used:

n = 69 + 12 \log_2 (f_0 / 440).  (3.2)

In music notation, each note is named with one of the following symbols: Do, Re, Mi, Fa, Sol, La, Ci. A sequence of notes from Do to Ci is called an octave. In a given octave, the fundamental frequency of each note is an integer multiple of the fundamental frequency of its namesake in previous octaves. Since the harmonics of each note are also integer multiples of its fundamental frequency, these harmonics coincide with its namesake notes in the next octaves [8]. For example, the fundamental frequency of La in octave 4 (i.e., La4) is 220 Hz, so the frequency of its second harmonic is 440 Hz, which equals the fundamental frequency of La in octave 5 (i.e., La5). This is the frequency overlapping problem mentioned above, and it is also a key problem when performing the iterative deletion.

Spectral envelopes of different instruments show obvious differences, while instruments of the same kind have similar spectral envelopes. Music played by the same instrument has a high degree of similarity and a stable harmonic structure. Moreover, computer-synthesized music is produced from the same software wavetable, so we can assume that the harmonic structure of computer music synthesized by the same kind of instrument is almost identical. This paper assumes that the harmonic structure is unchanged when the F0 of a note varies slightly within a semitone.

3.2. Usage of Harmonic Structure

Based on the above characteristics, the information of the harmonic structure can be used to improve multiple fundamental frequency estimation algorithms. In our algorithm, this information is used in two places: in the training stage, to obtain the harmonic structure of each note played by different instruments, and in the identification stage, to determine the candidate fundamental frequencies through the harmonic structure matching rate, which increases the accuracy of the following multiple-F0 estimation.

In the training stage, the parameters of each note are extracted from the training materials; each training material contains only one note, and each note is trained on 100 materials. We analyze the spectral envelope of each material and calculate the harmonic structure H_n. We set a training threshold th1: if the normalized spectral envelope is larger than th1, the corresponding element of the harmonic structure is set to the harmonic amplitude; otherwise it is set to 0.

For an inharmonic instrument, partial frequencies can deviate further from this frequency range. As a consequence, these partials are missed or assigned erroneously to other partials by our system [9]. However, this situation only occurs for strongly inharmonic instruments and at high partial indices k. The partial frequencies of an inharmonic instrument can be expressed as

k f_0(n) \sqrt{1 + \beta k^2},

where β is the inharmonicity coefficient. Typical values of β range from 10^{-4} to 10^{-3} for piano bass notes [10]. The partial frequencies exceed the selected frequency range from k = 25 for β = 10^{-4}, k = 11 for β = 5 × 10^{-4}, or k = 8 for β = 10^{-3}. The fundamental frequency f_0(n) of the MIDI note n is calculated as

f_0(n) = 440 \cdot 2^{(n-69)/12}.  (3.3)
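Equations (3.2)-(3.3) and the inharmonic partial model are direct to compute. The sketch below only encodes the formulas quoted above; the function names are ours, and the example value of β is taken from the piano range cited from [10].

```python
import numpy as np

def freq_to_midi(f0):
    """Eq. (3.2): MIDI note number of a fundamental frequency in Hz."""
    return 69 + 12 * np.log2(f0 / 440.0)

def midi_to_freq(n):
    """Eq. (3.3): fundamental frequency f_0(n) of MIDI note n."""
    return 440.0 * 2 ** ((n - 69) / 12)

def partial_freq(f0, k, beta=0.0):
    """k-th partial of an inharmonic string: k * f0 * sqrt(1 + beta * k^2).
    beta = 0 recovers exact integer-multiple harmonics."""
    return k * f0 * np.sqrt(1.0 + beta * k ** 2)

f0 = midi_to_freq(69)                    # MIDI note 69 -> 440.0 Hz by (3.3)
print(freq_to_midi(f0))                  # 69.0
# For beta = 1e-3 the 8th partial already leaves the quarter-tone band
# around 8 * f0, matching the k = 8 threshold quoted above.
print(partial_freq(f0, 8, beta=1e-3))    # ~3631 Hz vs. 8 * 440 = 3520 Hz
```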
The analysis above indicates that, because of inharmonicity, the harmonics of a note may not lie exactly at the integer multiples of the fundamental frequency. Accordingly, when we calculate H_n, we use a'_k instead of a_k, where a'_k is the maximum partial amplitude found in the frequency range [k f_0 \cdot 2^{-1/24}, k f_0 \cdot 2^{1/24}]. When there are no partials in this frequency range, a'_k is set to 0.

In the identification stage, the harmonic structure matching rate can be used to determine the candidate fundamental frequencies. When a note is played, the spectrum contains its entire set of harmonics. If f_0 is located in the same semitone as the note n, it will have the same harmonic structure as H_n, and the matching ratio of the kth harmonic can be defined as

r(f_0, k) = |Y(f_k)| / a_k,  (3.4)

where Y(f_k) is the STFT of the music and a_k is the kth element of H_n. In order to remove the influence of harmonic overlap in polyphonic music, the minimal harmonic matching ratio over all harmonic components is chosen as the harmonic matching ratio of this note, as shown in

r(f_0) = \min_{1 \le k \le n} r(f_0, k).  (3.5)

Then, in order to better describe how the notes are played, a strength can further be defined for each note.
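The following sketch implements the matching computation of (3.4)-(3.5) under stated assumptions: Y is one magnitude-STFT frame, freqs holds the frequency of each bin, H is a trained harmonic structure H_n stored as a NumPy array, and a'_k is searched in the quarter-tone band described above. The helper names are hypothetical.

```python
import numpy as np

def harmonic_amplitudes(spec, freqs, f0, n):
    """a'_k: maximum partial amplitude in the quarter-tone band
    [k*f0*2**(-1/24), k*f0*2**(1/24)]; 0 when the band is empty."""
    a = np.zeros(n)
    for k in range(1, n + 1):
        lo, hi = k * f0 * 2 ** (-1 / 24), k * f0 * 2 ** (1 / 24)
        band = spec[(freqs >= lo) & (freqs <= hi)]
        a[k - 1] = band.max() if band.size else 0.0
    return a

def matching_ratio(Y, freqs, f0, H):
    """Eqs. (3.4)-(3.5): per-harmonic ratios |Y(f_k)| / a_k, then the
    minimum over all harmonics to suppress overlapped partials."""
    observed = harmonic_amplitudes(np.abs(Y), freqs, f0, len(H))
    valid = H > 0                  # skip harmonics zeroed by threshold th1
    ratios = observed[valid] / H[valid]
    return ratios.min() if ratios.size else 0.0
```

Taking the minimum over k is what makes the ratio robust to overlap: an overlapped harmonic can only inflate individual ratios, never the minimum.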
Similar Resources
claVision: Visual Automatic Piano Music Transcription
One important problem in Music Information Retrieval is Automatic Music Transcription, which is an automated conversion process from played music to a symbolic notation such as sheet music. Since the accuracy of previous audio-based transcription systems is not satisfactory, we propose an innovative visual-based automatic music transcription system named claVision to perform piano music transcri...
Automatic Identification and Classification of the Iranian Traditional Music Scales (Dastgāh) and Melody Models (Gusheh): Analytical and Comparative Review on Conducted Research
Background and Aim: Automatic identification and classification of the Iranian traditional music scales (Dastgāh) and melody models (Gusheh) have attracted the attention of researchers for more than a decade. The current research aims to review the studies conducted in this area and to consider their different approaches and obstacles. Method: The research approach is content analysis and data col...
Optimization of the Physical Characteristics of a Constructed Wetland Using the MUSIC-GA Simulator-Optimizer Model
Nowadays, urbanization is a common process which replaces vegetation cover with impervious areas. This process increases urban stormwater. A new tendency in urban stormwater management endorses ‘source control’, whereby distributed water sensitive urban design systems are built throughout the subdivisions to alleviate the effects of land use changes. Various costs and functions are ...
Automatic notation of played music (status report)
A computer program automatically converting sounding music into written music is presented. A hardware system is used for the identification of rests and for pitch detection. A statistical treatment of the pitch periods serves as the basis for the identification of the actual scale used in each individual melody. An analogous treatment of the durations defines the note values. The transcr...
A Computational Model for Multi-Instrument Music Transcription, CS 229 Final Project Report, Autumn 2013
The aim of our project is to build a model for multi-instrument music transcription. Automatic music transcription is the process of converting an audio wave file into some form of music-note representation. We propose a two-step process for an automatic multi-instrument music transcription system, including timbre classification and source separation using probabilistic latent component analysis.
Automatic Segmentation and Classification of Articulation in Monophonic Music
Automatic music transcription has attracted a great interest from computer scientists and musicians for more than thirty-five years. The process of transcription includes extraction of fundamental frequencies, and segmentation of the continuous audio signal by detecting onsets and offsets of sound events. In a following step, quantization of pitch and timing gives a symbolic representation of t...
Journal: Abstract and Applied Analysis
Volume/Issue: –
Pages: –
Publication year: 2014